Skip to content

Fix UI stuck on “Disconnected” during network-change engine restart#167

Open
pappz wants to merge 7 commits intomainfrom
fix/reconnection-notification
Open

Fix UI stuck on “Disconnected” during network-change engine restart#167
pappz wants to merge 7 commits intomainfrom
fix/reconnection-notification

Conversation

@pappz
Copy link
Copy Markdown
Collaborator

@pappz pappz commented Apr 20, 2026

Title

Fix UI stuck on “Disconnected” during network-change engine restart


Summary

Fixes two issues that caused the NetBird Android app to appear “Disconnected” for up to ~27 seconds during WiFi ↔ cellular transitions, even though the engine eventually recovered.


Problem

Diagnosed from user-reported logs of a WiFi → cellular → WiFi roam:

  1. UI stuck on “Disconnected” during restart
    EngineRestarter stops and restarts the Go engine on network-type changes, but neither the restarter nor the engine reported a “Connecting” state while the restart was in progress.
    As a result, the UI only received onDisconnected and waited for onConnected, which could take 20+ seconds if the new connection stalled.

  2. Socket dials stalled (~12 seconds)
    New sockets (Signal, Management) created immediately after a WiFi switch experienced TCP SYN retransmissions via the outgoing interface.
    This happened because sockets were not pinned to the current default network.
    Calling VpnService.protect(fd) prevents VPN routing loops but does not bind the socket to a specific underlying network.

  3. Spurious restart on cold start
    ConcreteNetworkAvailabilityListener treated Android’s initial onAvailable burst (triggered right after registerNetworkCallback) as a network transition.
    This caused EngineRestarter to cancel the first login with context canceled.


Changes

  • Emit synthetic connection states during restart
    (EngineRestarter.java, EngineRunner.java)

    • Call ConnectionListener.onConnecting() when stop() is invoked and again when restart begins
    • Ensures the UI shows “Connecting” throughout the restart
    • Call onDisconnected() on restart failure or after a 30-second timeout to prevent indefinite “Connecting”
  • Bind process to current default network
    (NetworkChangeDetector.java)

    • Register registerDefaultNetworkCallback
    • Call ConnectivityManager.bindProcessToNetwork(network) on each default network change
    • Ensures all new Go sockets use the active network immediately
    • Unbind on onLost and during unregisterNetworkCallback
  • Ignore initial onAvailable burst
    (ConcreteNetworkAvailabilityListener.java)

    • Introduce a 3-second grace window after subscribe()
    • Prevents startup callback bursts from being treated as real transitions
    • Avoids unintended engine restarts on app launch
  • Update dependency

    • Bump NetBird submodule to test branch

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed notification disruptions by disabling sound and vibration, and reducing notification channel importance.
  • Refactor

    • Enhanced connection state handling and improved network change detection to better manage device network transitions.
  • Tests

    • Added and updated test coverage for network availability and connection state scenarios.

pappz added 3 commits April 20, 2026 14:43
When EngineRestarter stopped and restarted the Go engine after a
network type change, the UI only saw the engine's onDisconnected
callback and had no visibility into the reconnect attempt. If the
restart stalled (e.g. on a stale management RPC), the UI stayed on
Disconnected for the full stall window, making it look like the
client never reconnected.

Emit onConnecting() from EngineRestarter at stop and at re-launch to
keep the UI in the Connecting state throughout the restart, and emit
onDisconnected() on error or the 30s safety timeout so a truly failed
restart doesn't leave the UI stuck on Connecting.
Pin the process's outgoing sockets to the current default Android
Network via ConnectivityManager.bindProcessToNetwork so fresh dials
after a WiFi/cellular switch do not stall on TCP SYN retransmits
through the departing interface.

Skip the initial onAvailable burst fired right after registering the
NetworkCallback. That burst reflects current state, not a transition,
and was triggering a spurious EngineRestarter restart that cancelled
the in-flight login on cold start.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c6af8f3-eba6-43d0-96ca-25bcf4c626e3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/reconnection-notification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pappz pappz changed the title Fix/reconnection notification Fix UI stuck on “Disconnected” during network-change engine restart Apr 20, 2026
pappz added 4 commits April 20, 2026 15:40
Replace the time-based grace window with an isEngineRunning predicate.
The initial onAvailable burst that Android fires right after
registerNetworkCallback cannot trigger an EngineRestarter run because
the engine is not up yet at that point.

Tests updated accordingly; adds coverage for the engine-not-running
path.
Use IMPORTANCE_LOW and explicitly clear sound/vibration on the channel
so the persistent VPN notification does not play a sound or vibrate on
creation or each connection state update.
@pappz pappz marked this pull request as ready for review April 21, 2026 09:02
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tool/src/main/java/io/netbird/client/tool/EngineRestarter.java (1)

52-57: ⚠️ Potential issue | 🟠 Major

Remove the restart listener when the timeout fires.

After the 30s timeout, isRestartInProgress is reset and onDisconnected() is emitted, but the anonymous ServiceStateListener remains registered. If the engine stops later, that stale listener can still call runWithoutAuth() and restart after the timeout path declared failure.

🔧 Proposed fix
         timeoutCallback = () -> {
             if (isRestartInProgress) {
                 Log.e(LOGTAG, "engine restart timeout - forcing flag reset");
                 isRestartInProgress = false;
+                if (currentListener != null) {
+                    engineRunner.removeServiceStateListener(currentListener);
+                    currentListener = null;
+                }
                 notifyDisconnected();
             }
         };
@@
             public void onStarted() {
                 Log.d(LOGTAG, "engine restarted successfully");
                 isRestartInProgress = false;  // Reset flag on success
                 handler.removeCallbacks(timeoutCallback);  // Cancel timeout
                 engineRunner.removeServiceStateListener(this);
+                currentListener = null;
             }
@@
             public void onError(String msg) {
                 Log.e(LOGTAG, "restart failed: " + msg);
                 isRestartInProgress = false; // Resetting flag on error as well
                 handler.removeCallbacks(timeoutCallback);  // Cancel timeout
                 engineRunner.removeServiceStateListener(this);
+                currentListener = null;
                 notifyDisconnected();
             }

Also applies to: 65-90, 142-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tool/src/main/java/io/netbird/client/tool/EngineRestarter.java` around lines
52 - 57, The timeoutCallback currently resets isRestartInProgress and calls
notifyDisconnected() but leaves the anonymous ServiceStateListener registered;
update timeoutCallback to also unregister/remove that listener (the same
instance registered earlier) when the timeout fires so the stale
ServiceStateListener can't later call runWithoutAuth() and trigger another
restart; apply the same fix to the other restart-timeout handlers in the file
(the blocks around the ServiceStateListener registration in the 65-90 and
142-155 sections) ensuring you keep a reference to the ServiceStateListener
instance so you can call the appropriate remove/unregister method when
cancelling on success or timeout.
tool/src/main/java/io/netbird/client/tool/ForegroundNotification.java (1)

25-31: ⚠️ Potential issue | 🟠 Major

Use a new channel ID for silent/low-importance notification settings to affect existing installs.

When createNotificationChannel() is called with an existing channel ID, Android ignores updates to sound, vibration, and lights properties—these can only be set on initial channel creation. Importance can only be lowered if the user hasn't modified channel settings. Existing users will retain their original audible behavior unless you use a different channel ID.

Suggested fix
-        String channelId = service.getPackageName();
+        String channelId = service.getPackageName() + ".foreground.silent";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tool/src/main/java/io/netbird/client/tool/ForegroundNotification.java` around
lines 25 - 31, ForegroundNotification currently re-uses service.getPackageName()
as the NotificationChannel ID which prevents changing sound/vibration for
existing installs; change to a new, distinct channel ID (e.g., a constant like
FOREGROUND_CHANNEL_ID_SILENT or service.getPackageName() + ".fg_silent") used
when creating the NotificationChannel in the same creation block (the
NotificationChannel constructor and channel.setSound/enableVibration calls) and
update any places that build/post the foreground Notification to use that new
channel ID so the silent/low-importance settings apply to all users.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@netbird`:
- Line 1: The submodule pointer references commit
5b09078da2ac14550741ce8731e7cf4b4a62a728 which is not reachable; verify whether
that commit exists in the official netbirdio/netbird.git or a private fork, then
update the submodule to a valid commit: check the branch containing the intended
change, fetch the correct commit hash (or switch the submodule URL if it should
point to a different repo), and update the submodule reference (e.g., via git
submodule update --init --remote or by committing the corrected SHA in the
superproject and updating .gitmodules if the URL must change) so cloning with
--recurse-submodules succeeds.

In
`@tool/src/main/java/io/netbird/client/tool/networks/NetworkChangeDetector.java`:
- Around line 63-85: The onAvailable/onLost handlers in
initDefaultNetworkCallback must be guarded by an atomic/current bound-network
state and a callback-active flag so late onLost or onAvailable after unregister
don't undo a newer binding; add a field (e.g., defaultNetworkCallbackActive) set
to true before registering the defaultNetworkCallback and set to false when
unregisterNetworkCallback begins, track the currentlyBoundDefaultNetwork (or
similar) when bindProcessToNetwork(network) succeeds, and in onLost only clear
the binding if the lost network equals the currentlyBoundDefaultNetwork and
defaultNetworkCallbackActive is true; likewise ignore onAvailable if
defaultNetworkCallbackActive is false to avoid rebinding after shutdown.

---

Outside diff comments:
In `@tool/src/main/java/io/netbird/client/tool/EngineRestarter.java`:
- Around line 52-57: The timeoutCallback currently resets isRestartInProgress
and calls notifyDisconnected() but leaves the anonymous ServiceStateListener
registered; update timeoutCallback to also unregister/remove that listener (the
same instance registered earlier) when the timeout fires so the stale
ServiceStateListener can't later call runWithoutAuth() and trigger another
restart; apply the same fix to the other restart-timeout handlers in the file
(the blocks around the ServiceStateListener registration in the 65-90 and
142-155 sections) ensuring you keep a reference to the ServiceStateListener
instance so you can call the appropriate remove/unregister method when
cancelling on success or timeout.

In `@tool/src/main/java/io/netbird/client/tool/ForegroundNotification.java`:
- Around line 25-31: ForegroundNotification currently re-uses
service.getPackageName() as the NotificationChannel ID which prevents changing
sound/vibration for existing installs; change to a new, distinct channel ID
(e.g., a constant like FOREGROUND_CHANNEL_ID_SILENT or service.getPackageName()
+ ".fg_silent") used when creating the NotificationChannel in the same creation
block (the NotificationChannel constructor and channel.setSound/enableVibration
calls) and update any places that build/post the foreground Notification to use
that new channel ID so the silent/low-importance settings apply to all users.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b5d47af5-9d72-4650-b86d-dbd761803e47

📥 Commits

Reviewing files that changed from the base of the PR and between e92798a and b2d0f6d.

📒 Files selected for processing (8)
  • netbird
  • tool/src/main/java/io/netbird/client/tool/EngineRestarter.java
  • tool/src/main/java/io/netbird/client/tool/EngineRunner.java
  • tool/src/main/java/io/netbird/client/tool/ForegroundNotification.java
  • tool/src/main/java/io/netbird/client/tool/VPNService.java
  • tool/src/main/java/io/netbird/client/tool/networks/ConcreteNetworkAvailabilityListener.java
  • tool/src/main/java/io/netbird/client/tool/networks/NetworkChangeDetector.java
  • tool/src/test/java/io/netbird/client/tool/ConcreteNetworkAvailabilityListenerUnitTest.java

Comment thread netbird
@@ -1 +1 @@
Subproject commit f01c1eea6ac803fff4678abe93f95b4c965e372b
Subproject commit 5b09078da2ac14550741ce8731e7cf4b4a62a728
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify the submodule commit exists and inspect its details

# Check if submodule directory exists and fetch commit details
if [ -d "netbird" ]; then
  cd netbird
  # Verify commit exists
  git cat-file -t 5b09078da2ac14550741ce8731e7cf4b4a62a728 2>/dev/null || echo "ERROR: Commit not found"
  
  # Show commit details
  git log -1 --oneline 5b09078da2ac14550741ce8731e7cf4b4a62a728 2>/dev/null
  
  # Check which branches contain this commit
  git branch -r --contains 5b09078da2ac14550741ce8731e7cf4b4a62a728 2>/dev/null
else
  echo "Submodule directory 'netbird' not found"
fi

Repository: netbirdio/android-client

Length of output: 93


🏁 Script executed:

# Check if .gitmodules exists and shows netbird submodule configuration
if [ -f ".gitmodules" ]; then
  cat .gitmodules
else
  echo ".gitmodules not found"
fi

Repository: netbirdio/android-client

Length of output: 153


🏁 Script executed:

# Check the current netbird submodule pointer in the parent repo
git config --file .gitmodules --get-regexp netbird 2>/dev/null || echo "No netbird submodule found in .gitmodules"

Repository: netbirdio/android-client

Length of output: 161


🏁 Script executed:

# Verify the commit hash format is valid (40 hex characters)
commit_hash="5b09078da2ac14550741ce8731e7cf4b4a62a728"
if [[ $commit_hash =~ ^[0-9a-f]{40}$ ]]; then
  echo "Valid 40-character hex commit hash format"
else
  echo "Invalid commit hash format"
fi

Repository: netbirdio/android-client

Length of output: 109


🌐 Web query:

NetBird repository commit 5b09078da2ac14550741ce8731e7cf4b4a62a728

💡 Result:

No information found on commit 5b09078da2ac14550741ce8731e7cf4b4a62a728 in the NetBird repository (https://github.com/netbirdio/netbird). Extensive searches across GitHub commits, pull requests, and related pages did not surface this specific SHA. It may be from a private fork, deleted branch, or different repository. The main NetBird repo has recent activity up to April 2026 with commits like 5b85edb (management: omit proxy_protocol), fd9d430 (client: simplify entrypoint), but no match for the queried hash.

Citations:


Submodule commit is not reachable in the official NetBird repository. Verify and correct the commit hash.

The submodule is pointing to commit 5b09078da2ac14550741ce8731e7cf4b4a62a728, but this commit does not exist in the official NetBird repository (https://github.com/netbirdio/netbird.git). This will cause cloning to fail with --recurse-submodules.

Verify that:

  1. The commit exists and is accessible (may be in a private fork or needs a different URL)
  2. If this is from a test branch, confirm the correct commit hash from that branch
  3. Update the submodule pointer to a valid, reachable commit before merging
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@netbird` at line 1, The submodule pointer references commit
5b09078da2ac14550741ce8731e7cf4b4a62a728 which is not reachable; verify whether
that commit exists in the official netbirdio/netbird.git or a private fork, then
update the submodule to a valid commit: check the branch containing the intended
change, fetch the correct commit hash (or switch the submodule URL if it should
point to a different repo), and update the submodule reference (e.g., via git
submodule update --init --remote or by committing the corrected SHA in the
superproject and updating .gitmodules if the URL must change) so cloning with
--recurse-submodules succeeds.

Comment on lines +63 to +85
private void initDefaultNetworkCallback() {
defaultNetworkCallback = new ConnectivityManager.NetworkCallback() {
@Override
public void onAvailable(@NonNull Network network) {
Log.d(LOGTAG, "default network became " + network + ", binding process to it");
try {
if (!connectivityManager.bindProcessToNetwork(network)) {
Log.w(LOGTAG, "bindProcessToNetwork returned false for " + network);
}
} catch (Exception e) {
Log.e(LOGTAG, "bindProcessToNetwork failed", e);
}
}

@Override
public void onLost(@NonNull Network network) {
Log.d(LOGTAG, "default network " + network + " lost, clearing process binding");
try {
connectivityManager.bindProcessToNetwork(null);
} catch (Exception e) {
Log.e(LOGTAG, "bindProcessToNetwork(null) failed", e);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard process binding with active/current-network state.

onLost() always clears the process binding, so an old default network’s late onLost() can erase a newer binding from onAvailable(). Similarly, an in-flight onAvailable() during unregisterNetworkCallback() can re-bind the process after shutdown. Track the currently bound default network and ignore callbacks once unregister starts.

🔧 Proposed direction
+    private final Object defaultNetworkBindingLock = new Object();
+    private boolean defaultNetworkCallbackActive = false;
+    private Network boundDefaultNetwork;

     private void initDefaultNetworkCallback() {
         defaultNetworkCallback = new ConnectivityManager.NetworkCallback() {
             `@Override`
             public void onAvailable(`@NonNull` Network network) {
+                synchronized (defaultNetworkBindingLock) {
+                    if (!defaultNetworkCallbackActive) {
+                        return;
+                    }
+                    boundDefaultNetwork = network;
+                }
                 Log.d(LOGTAG, "default network became " + network + ", binding process to it");
                 try {
                     if (!connectivityManager.bindProcessToNetwork(network)) {
                         Log.w(LOGTAG, "bindProcessToNetwork returned false for " + network);
@@
             `@Override`
             public void onLost(`@NonNull` Network network) {
+                synchronized (defaultNetworkBindingLock) {
+                    if (!defaultNetworkCallbackActive || !network.equals(boundDefaultNetwork)) {
+                        return;
+                    }
+                    boundDefaultNetwork = null;
+                }
                 Log.d(LOGTAG, "default network " + network + " lost, clearing process binding");
                 try {
                     connectivityManager.bindProcessToNetwork(null);
                 } catch (Exception e) {
@@
     public void unregisterNetworkCallback() {
+        synchronized (defaultNetworkBindingLock) {
+            defaultNetworkCallbackActive = false;
+            boundDefaultNetwork = null;
+        }
         try {
             connectivityManager.unregisterNetworkCallback(networkCallback);

Also set defaultNetworkCallbackActive = true before registering the default callback, and reset it if registration fails.

Also applies to: 96-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tool/src/main/java/io/netbird/client/tool/networks/NetworkChangeDetector.java`
around lines 63 - 85, The onAvailable/onLost handlers in
initDefaultNetworkCallback must be guarded by an atomic/current bound-network
state and a callback-active flag so late onLost or onAvailable after unregister
don't undo a newer binding; add a field (e.g., defaultNetworkCallbackActive) set
to true before registering the defaultNetworkCallback and set to false when
unregisterNetworkCallback begins, track the currentlyBoundDefaultNetwork (or
similar) when bindProcessToNetwork(network) succeeds, and in onLost only clear
the binding if the lost network equals the currentlyBoundDefaultNetwork and
defaultNetworkCallbackActive is true; likewise ignore onAvailable if
defaultNetworkCallbackActive is false to avoid rebinding after shutdown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant